
    cTuning.org: novel extensible methodology, framework and public repository to collaboratively address Exascale challenges

    Designing and optimizing novel computing systems has become intolerably complex, ad hoc, costly and error-prone due to an unprecedented number of available tuning choices and complex interactions between software and hardware components. I present a novel holistic methodology, extensible infrastructure and public repository (cTuning.org and Collective Mind) to overcome the rising complexity of computer systems by distributing their characterization and optimization among multiple users. This technology combines online auto-tuning, run-time adaptation, data mining and predictive modeling to collaboratively analyze thousands of codelets and datasets, explore large optimization spaces and detect abnormal behavior. It then extrapolates the collected knowledge to suggest program optimizations, run-time adaptation scenarios or architecture designs that balance performance, power consumption and other characteristics. This technology has recently been validated and extended in several academic and industrial projects with NCAR, Intel Exascale Lab, IBM and CAPS Entreprise, and we believe it will be vital for developing future Exascale systems.
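    To make the collaborative exploration concrete, the following is a minimal Python sketch of the kind of auto-tuning loop described above: it enumerates a small space of compiler flag combinations, measures each variant, and shares every observation with a repository. The flag list, the gcc invocation and the submit_result helper are illustrative assumptions for this sketch, not the actual cTuning/Collective Mind interface.

    import itertools
    import subprocess
    import time

    # Candidate optimization flags: an illustrative slice of a much larger tuning space.
    CANDIDATE_FLAGS = ["-O2", "-O3", "-funroll-loops", "-ffast-math"]

    def compile_and_time(source, flags, runs=3):
        """Compile one codelet with the given flags and return its median run time."""
        subprocess.run(["gcc", *flags, source, "-o", "a.out"], check=True)
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(["./a.out"], check=True)
            times.append(time.perf_counter() - start)
        return sorted(times)[len(times) // 2]

    def submit_result(repository, record):
        """Placeholder: share one (flags, runtime) observation with a public repository."""
        repository.append(record)  # a real system would send this to a shared server

    def explore(source, repository):
        """Exhaustively try flag subsets (feasible only for small spaces) and return the best."""
        best = None
        for r in range(1, len(CANDIDATE_FLAGS) + 1):
            for flags in itertools.combinations(CANDIDATE_FLAGS, r):
                runtime = compile_and_time(source, list(flags))
                submit_result(repository, {"flags": flags, "time": runtime})
                if best is None or runtime < best[1]:
                    best = (flags, runtime)
        return best

    In a real setting the space is far too large to enumerate, which is why the methodology above samples it collaboratively across many users and applies data mining and predictive modeling to the pooled results.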

    Collective Mind, Part II: technical report

    Nowadays, engineers often have to develop software without knowing which hardware it will eventually run on, across numerous mobile phones, tablets, laptops, data centers, supercomputers and cloud services. Unfortunately, optimizing compilers often fail to produce fast and energy-efficient code across all hardware configurations. In this technical report, we present the first practical, collaborative, publicly available, Wikipedia-inspired solution to this problem that we are aware of, based on our recent Collective Mind Infrastructure and Repository.
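    As a rough illustration of how such crowd-sourced tuning knowledge could be reused on previously unseen configurations, the sketch below suggests compiler flags for a new program/hardware pair with a simple nearest-neighbour lookup over records contributed by other users. The record layout and feature names are assumptions made for this example, not the Collective Mind schema.

    from collections import Counter

    def distance(a, b):
        """Euclidean distance between two feature dictionaries (missing keys count as 0)."""
        keys = set(a) | set(b)
        return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys) ** 0.5

    def suggest_flags(records, features, k=5):
        """Return the most common best-performing flag set among the k most similar records."""
        nearest = sorted(records, key=lambda r: distance(r["features"], features))[:k]
        votes = Counter(tuple(r["best_flags"]) for r in nearest)
        return list(votes.most_common(1)[0][0])

    # Example usage with made-up records:
    records = [
        {"features": {"cache_kb": 512, "loop_depth": 2}, "best_flags": ["-O3", "-funroll-loops"]},
        {"features": {"cache_kb": 256, "loop_depth": 1}, "best_flags": ["-O2"]},
        {"features": {"cache_kb": 1024, "loop_depth": 3}, "best_flags": ["-O3", "-ffast-math"]},
    ]
    print(suggest_flags(records, {"cache_kb": 600, "loop_depth": 2}))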

    Iterative Compilation and Performance Prediction for Numerical Applications

    Institute for Computing Systems Architecture
    As the current rate of improvement in processor performance far exceeds the rate of improvement in memory performance, memory latency is the dominant overhead in many performance-critical applications. In many cases, automatic compiler-based approaches to improving memory performance are limited, and programmers frequently resort to manual optimisation techniques. However, this process is tedious and time-consuming. Furthermore, a diverse range of rapidly evolving hardware makes the optimisation process even more complex. It is often hard to predict the potential benefit of different optimisations, and there are no simple criteria for stopping optimisation, i.e. for deciding when optimal memory performance has been achieved or sufficiently approached. This thesis presents a platform-independent optimisation approach for numerical applications based on iterative feedback-directed program restructuring, using a new, reasonably fast and accurate performance prediction technique to guide optimisations. New strategies for searching the optimisation space by means of profiling, in order to find the best possible program variant, have been developed. These strategies have been evaluated using a range of kernels and programs on different platforms and operating systems. A significant performance improvement has been achieved with these new approaches compared to state-of-the-art native static and platform-specific feedback-directed compilers.
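    A minimal sketch of the iterative, prediction-guided loop this abstract describes is given below: candidate program variants are profiled one by one, and the search stops once the best measured time comes within a tolerance of a predicted achievable minimum. The callables generate_variants, measure and predict_lower_bound are placeholders standing in for the profiling and performance-prediction machinery of the thesis, not its actual implementation.

    def iterative_optimise(program, generate_variants, measure, predict_lower_bound,
                           tolerance=0.05, max_trials=50):
        """Try restructured variants until the runtime is within `tolerance` of the
        predicted achievable minimum, or the trial budget is exhausted."""
        bound = predict_lower_bound(program)           # e.g. from a memory-hierarchy model
        best_variant, best_time = program, measure(program)
        for trial, variant in enumerate(generate_variants(program)):
            if trial >= max_trials:
                break
            t = measure(variant)                       # profile the candidate variant
            if t < best_time:
                best_variant, best_time = variant, t
            if best_time <= bound * (1.0 + tolerance): # close enough to the predicted bound
                break
        return best_variant, best_time

    The prediction-based stopping criterion addresses the problem raised above: without it there is no simple way to tell when further restructuring is unlikely to pay off.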